Logic Program Induction using MDL and MAP: An Application to Grammars
Abstract
Probabilistic programs provide an appealing language for describing mental theories, because they are Turing complete: any computable process may be described as a program. Program induction is the problem of inferring theories, in the form of (probabilistic) programs, that describe some set of observations. Minimum Description Length, or MDL, is one common approach to program induction [11]. The MDL approach selects the hypothesis (program) such that the sum of the length of the program, along with the length of the data (observations), when encoded with the help of the program, is minimized. Exactly how the data is encoded depends upon the hypothesis; when MDL is used for program induction, the encoding of the data is typically some sort of certificate proving that the program outputs the observations. Instead of MDL, one may also take a Bayesian approach to program induction. This approach involves placing a prior upon programs and calculating, for a given program, the likelihood of the observations. Typically the prior penalizes longer programs [7, pg 385]. In many situations, such as polynomial curve fitting, the MDL and Bayesian approaches coincide: the MAP hypothesis and the hypothesis with minimal description length are the same [7, pg 392]. In this project, I explore the problem of evaluating candidate programs to explain some number of observations, using both the MDL and MAP criteria. I focus on grammar induction: inferring the process (probabilistic program) that generates phrases found within a corpus. Both MDL and MAP are used
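As a rough illustration of the two criteria described in the abstract, the sketch below scores two made-up candidate hypotheses against a handful of observed phrases. The candidate names, model lengths in bits, and phrase probabilities are hypothetical and not taken from the paper. The prior P(H) proportional to 2^(-L(H)) is an assumption, chosen because it is the standard setting in which the MDL and MAP selections coincide.

```python
import math

# Hypothetical candidates: (name, model length in bits, phrase -> probability).
# None of these values come from the paper; they only illustrate the trade-off.
CANDIDATES = [
    ("memorize", 40.0, {"the dog": 0.5, "the cat": 0.5}),
    ("compact", 12.0, {"the dog": 0.25, "the cat": 0.25,
                       "a dog": 0.25, "a cat": 0.25}),
]


def data_bits(observations, phrase_probs):
    """Bits needed to encode the observations given the hypothesis."""
    total = 0.0
    for phrase in observations:
        p = phrase_probs.get(phrase, 0.0)
        if p == 0.0:
            return math.inf  # the hypothesis cannot produce this phrase
        total += -math.log2(p)
    return total


def mdl_score(model_bits, observations, phrase_probs):
    """Description length: L(H) + L(D | H). Lower is better."""
    return model_bits + data_bits(observations, phrase_probs)


def log2_posterior(model_bits, observations, phrase_probs):
    """Unnormalized log2 posterior, assuming the prior P(H) ~ 2^(-L(H))."""
    return -model_bits - data_bits(observations, phrase_probs)


def best_by_mdl(candidates, observations):
    return min(candidates, key=lambda c: mdl_score(c[1], observations, c[2]))[0]


def best_by_map(candidates, observations):
    return max(candidates, key=lambda c: log2_posterior(c[1], observations, c[2]))[0]


if __name__ == "__main__":
    few = ["the dog", "the cat", "the dog"]
    many = few * 40
    for label, obs in (("few", few), ("many", many)):
        print(label, best_by_mdl(CANDIDATES, obs), best_by_map(CANDIDATES, obs))
```

Under this assumed prior the log posterior is the negated description length, so both criteria pick the same candidate. With few observations the shorter hypothesis is preferred; with many, the longer hypothesis that assigns higher probability to the data can win.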
Similar resources
Alternating Regular Tree Grammars in the Framework of Lattice-Valued Logic
In this paper, two different ways of introducing alternation for lattice-valued (referred to as L-valued) regular tree grammars and L-valued top-down tree automata are compared. One is the way which defines the alternating regular tree grammar, i.e., alternation is governed by the non-terminals of the grammar and the other is the way which combines state with alternation. The first way is ta...
MDL-Based Context-Free Graph Grammar Induction
We present an algorithm for the inference of context-free graph grammars from examples. The algorithm builds on an earlier system for frequent substructure discovery, and is biased toward grammars that minimize description length. Grammar features include recursion, variables and relationships. We present an illustrative example, demonstrate the algorithm's ability to learn in the presence of n...
Bayesian Induction of Bracketing Inversion Transduction Grammars
We present a novel approach to learning phrasal inversion transduction grammars via Bayesian MAP (maximum a posteriori) or information-theoretic MDL (minimum description length) model optimization so as to incorporate simultaneously the choices of model structure as well as parameters. In comparison to most current SMT approaches, the model learns phrase translation lexicons that (a) do not req...
Inductive Program Synthesis as Induction of Context-Free Tree Grammars
We present an application of grammar induction in the domain of inductive program synthesis. Synthesis of recursive programs from input/output examples involves the solution of two subproblems: transforming examples into straightforward programs and folding straightforward programs into (a set of) recursive equations. In this paper we focus on the second part of the synthesis problem, which cor...
Unsupervised Grammar Inference Using the Minimum Description Length Principle
Context Free Grammars (CFGs) are widely used in programming language descriptions, natural language processing, compilers, and other areas of software engineering where there is a need for describing the syntactic structures of programs. Grammar inference (GI) is the induction of CFGs from sample programs and is a challenging problem. We describe an unsupervised GI approach which uses simplicit...